translated by Google Translate
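Self-localization is a fundamental capability that mobile robot navigation systems integrate in order to move from one point to another using a map. Any improvement in localization accuracy is therefore crucial for performing delicate dexterity tasks. This paper describes a new localization system that maintains several particle populations using the Monte Carlo Localization (MCL) algorithm, always selecting the best population as the system's output. As novelties, our work includes a multi-scale matching algorithm to create new MCL populations and a metric to determine the most reliable one. It also contributes a state-of-the-art implementation that improves recovery times from erroneous estimates or unknown initial positions. The proposed method is evaluated in a module fully integrated with Nav2 and compared with the current state-of-the-art AMCL (Adaptive Monte Carlo Localization) solution, obtaining good accuracy and recovery times.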
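The idea of running several MCL populations and outputting the most reliable one can be illustrated with a minimal 1D sketch. It assumes a single landmark, an exact range sensor, and the mean (unnormalized) particle likelihood as the reliability metric; these choices and all function names are hypothetical, and neither the paper's multi-scale matching step nor its actual reliability metric is reproduced.

```python
import math
import random


def range_likelihood(expected, measured, sigma):
    # Gaussian likelihood of a range reading given the particle's predicted range
    return math.exp(-0.5 * ((expected - measured) / sigma) ** 2)


def mcl_step(particles, control, measurement, landmark, motion_noise, sensor_sigma, rng):
    """One predict-weight-resample cycle for a 1D particle population.

    Returns the new particles and the mean (unnormalized) weight, which this
    sketch uses as a hypothetical reliability score for the population.
    """
    # Predict: apply the odometry increment plus Gaussian motion noise
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: compare the predicted range to the landmark with the measured range
    weights = [range_likelihood(landmark - p, measurement, sensor_sigma) for p in moved]
    total = sum(weights)
    score = total / len(weights)
    if total == 0.0:
        # Every particle is implausible; skip resampling to avoid dividing by zero
        return moved, score
    # Resample in proportion to the weights
    return rng.choices(moved, weights=weights, k=len(moved)), score


def multi_population_mcl(populations, controls, measurements, landmark, rng,
                         motion_noise=0.1, sensor_sigma=0.5):
    """Runs MCL on every population; returns (best index, mean pose of best)."""
    populations = [list(p) for p in populations]  # work on copies
    best = 0
    for u, z in zip(controls, measurements):
        scores = []
        for i in range(len(populations)):
            populations[i], s = mcl_step(populations[i], u, z, landmark,
                                         motion_noise, sensor_sigma, rng)
            scores.append(s)
        # Select the most reliable population as the system output
        best = max(range(len(scores)), key=scores.__getitem__)
    chosen = populations[best]
    return best, sum(chosen) / len(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    landmark = 10.0
    good = [rng.gauss(2.0, 0.2) for _ in range(200)]   # near the true start (2.0)
    wrong = [rng.gauss(6.0, 0.2) for _ in range(200)]  # a wrong pose hypothesis
    truth = [2.0 + 0.5 * (k + 1) for k in range(5)]
    print(multi_population_mcl([good, wrong], [0.5] * 5,
                               [landmark - x for x in truth], landmark, rng))
```

Keeping a wrong hypothesis alive costs computation, but if the robot's true pose were the "wrong" one, that population would win the selection without a global reinitialization, which is the kind of recovery the abstract targets.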
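Social interaction networks are the substrate on which civilizations are built. Often, we form new bonds with people we like, or feel that our relationships have been damaged through the intervention of third parties. Despite their importance and the huge impact these processes have on our lives, our quantitative scientific understanding of them is still in its infancy, mainly because it is difficult to collect large social network datasets that include individual attributes. In this work, we present a thorough study of the real social networks of 13 schools, with more than 3,000 students and 60,000 declared positive and negative relationships, including tests of the personal traits of all students. We introduce a metric, the "triadic influence", that measures the influence of nearest neighbors on the relationships of their contacts. We use neural networks to predict relationships and to extract the probability that two students are friends or enemies depending on their personal attributes or on the triadic influence. Alternatively, we can use a high-dimensional embedding of the network structure to predict relationships. Remarkably, the triadic influence (a simple one-dimensional metric) achieves the highest accuracy in predicting the relationship between two students. We hypothesize that the probabilities extracted from the neural networks, which are functions of the triadic influence and the students' personalities, govern the evolution of real social networks, opening new avenues for the quantitative study of these systems.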
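The abstract does not give a formula for the triadic influence. As a hypothetical stand-in inspired by structural balance theory, the sketch below scores a pair (i, j) by averaging s(i,k) * s(k,j) over their common neighbours k, where s is +1 for a friendship and -1 for an enmity. The definition and all names are assumptions for illustration, not the paper's metric.

```python
from collections import defaultdict


def build_signed_graph(edges):
    """edges: iterable of (u, v, sign), sign +1 (friend) or -1 (enemy); undirected."""
    graph = defaultdict(dict)
    for u, v, sign in edges:
        graph[u][v] = sign
        graph[v][u] = sign
    return graph


def triadic_influence(graph, i, j):
    """Hypothetical score in [-1, 1]: mean of s(i,k) * s(k,j) over common neighbours k.

    Positive values mean the shared contacts suggest friendship (a friend of a
    friend, or an enemy of an enemy); negative values suggest enmity.
    """
    common = (set(graph.get(i, {})) & set(graph.get(j, {}))) - {i, j}
    if not common:
        return 0.0  # no shared contacts, so no triadic evidence either way
    return sum(graph[i][k] * graph[k][j] for k in common) / len(common)


def predict_relation(graph, i, j):
    """Balance-theory guess from the score: +1 friend, -1 enemy, 0 undecided."""
    score = triadic_influence(graph, i, j)
    return (score > 0) - (score < 0)
```

In the paper the one-dimensional score feeds a neural network that outputs friend/enemy probabilities; the sign rule here is only the simplest possible decision on top of such a score.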
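Diabetic retinopathy (DR) is one of the leading causes of blindness among the working-age population of developed countries; it is a side effect of diabetes that reduces the blood supply to the retina. Deep neural networks have been widely used in automated systems for DR classification on fundus images. However, these models require a large number of annotated images. In the medical domain, expert annotations are expensive, tedious, and time-consuming, so only a limited number of annotated images is available. This paper presents a semi-supervised method that leverages both unlabeled and labeled images to train a model that detects diabetic retinopathy. The proposed method uses unsupervised pretraining via self-supervised learning, followed by supervised fine-tuning with a small set of labeled images and knowledge distillation, to increase performance on the classification task. The method was evaluated on the EyePACS test set and the Messidor-2 dataset, achieving AUCs of 0.94 and 0.89 respectively while using only 2% of the labeled EyePACS training images.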
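The abstract names knowledge distillation but not its exact form. The sketch below shows the common Hinton-style loss: a temperature-softened KL term between teacher and student outputs, scaled by T^2, mixed with ordinary cross-entropy on the labels. The temperature, the mixing weight `alpha`, and the function names are assumptions; the paper's formulation may differ.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Hinton-style knowledge distillation loss, averaged over the batch.

    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student),
    where *_T are softmax outputs at temperature T (assumed hyperparameters).
    """
    eps = 1e-12  # avoid log(0)
    p_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student_t + eps)), axis=-1)
    p_student = softmax(student_logits)
    labels = np.asarray(labels)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + eps)
    return float(np.mean(alpha * temperature ** 2 * kl + (1.0 - alpha) * ce))
```

The T^2 factor keeps the gradient magnitude of the soft term comparable to the hard-label term as the temperature changes, which is why it is usually included.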
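The use of 3D cameras for gait analysis has been highly questioned because of the low accuracy they have demonstrated in the past. The objective of the study presented in this paper is to improve the accuracy of the estimates obtained with robot-mounted 3D cameras in human gait analysis by applying a supervised learning stage. The 3D camera was mounted on a mobile robot to obtain a longer walking distance. This study shows an improvement in the detection of kinematic gait signals and gait descriptors when the raw camera estimates are post-processed with artificial neural networks trained on data obtained from a certified Vicon system. To this end, 37 healthy participants were recruited, and data from 207 gait sequences were collected using an Orbbec Astra 3D camera. There are two basic training approaches: using kinematic gait signals and using gait descriptors. The former seeks to improve the waveforms of the kinematic gait signals by reducing the error and increasing the correlation with respect to the Vicon system. The latter is a more direct approach that trains the artificial neural networks directly on gait descriptors. The accuracy of the 3D camera was measured before and after training. Improvements were observed with both training approaches. The kinematic gait signals showed lower errors and higher correlations with respect to the ground truth. The accuracy of the system in detecting gait descriptors also improved substantially, mainly for kinematic descriptors rather than spatiotemporal ones. When comparing the two training approaches, it was not possible to determine which one is the best overall. We therefore believe that the choice of training approach will depend on the purpose of the study to be conducted. This study reveals the great potential of 3D cameras and encourages the research community to continue exploring their use in gait analysis.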
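The signal-level comparison described above (error and correlation with respect to Vicon) can be sketched with two standard metrics, RMSE and Pearson's correlation coefficient. The abstract does not specify which error measure was used, so RMSE is an assumption here, and the sample joint-angle values in the usage note are made up for illustration.

```python
import math


def rmse(estimate, reference):
    """Root-mean-square error between a camera signal and the Vicon reference."""
    if len(estimate) != len(reference) or not estimate:
        raise ValueError("signals must be non-empty and of equal length")
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(estimate))


def pearson(estimate, reference):
    """Pearson correlation coefficient between two equally long signals."""
    n = len(estimate)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimate, reference))
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimate))
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in reference))
    if sd_e == 0.0 or sd_r == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / (sd_e * sd_r)
```

For example, a raw signal that is a scaled and offset copy of the reference (such as `0.8 * r + 5` for each reference sample `r`) has a correlation of 1 but a non-zero RMSE, which is why both metrics are reported: correlation captures waveform shape, while the error captures bias and scale.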
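In recent years, specific evaluation metrics for time series anomaly detection algorithms have been developed to address the limitations of classical precision and recall. However, such metrics are built as aggregates of multiple desirable aspects, introduce parameters, and wipe out the interpretability of the output. In this paper, we first highlight the limitations of classical precision/recall, as well as the main issues of recent event-based metrics; for instance, we show that an adversary algorithm can reach high precision and recall on almost any dataset under weak assumptions. To address these problems, we propose a theoretically grounded, robust, parameter-free, and interpretable extension of the precision/recall metrics, based on the concept of "affiliation" between the ground truth and prediction sets. Our metrics leverage measures of duration between the ground truth and the predictions, and therefore have an intuitive interpretation. By further comparison with random sampling, we obtain a normalized precision/recall that quantifies how much better a given set of results is than a random baseline prediction. By construction, our approach keeps the evaluation local with respect to ground-truth events, enabling fine-grained visualization and interpretation of algorithmic results. We compare our proposal with various public time series anomaly detection datasets, algorithms, and metrics. We further derive theoretical properties of the affiliation metrics that give explicit expectations about their behavior and ensure robustness against adversary strategies.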
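The "measures of duration" idea can be illustrated with a deliberately simplified sketch: the average distance from each predicted timestamp to the nearest ground-truth event (precision side), and from each ground-truth timestamp to the nearest prediction (recall side). This is not the affiliation metric itself; the local, per-event evaluation and the normalization against random sampling described above are not reproduced, and the function names are hypothetical.

```python
def distance_to_intervals(t, intervals):
    """Distance from timestamp t to the closest interval [a, b] (0 if t is inside)."""
    return min(max(a - t, 0, t - b) for a, b in intervals)


def mean_directed_distances(gt_events, predicted):
    """Simplified duration-based scores; lower is better for both.

    gt_events: list of inclusive integer intervals (start, end).
    predicted: list of integer timestamps flagged as anomalous.
    Returns (precision-side distance, recall-side distance).
    """
    if not gt_events or not predicted:
        raise ValueError("need at least one ground-truth event and one prediction")
    # Precision side: how far each prediction lies from the nearest true event
    prec = sum(distance_to_intervals(t, gt_events) for t in predicted) / len(predicted)
    # Recall side: how far each true anomalous timestamp lies from the nearest prediction
    gt_points = [t for a, b in gt_events for t in range(a, b + 1)]
    pred_points = [(t, t) for t in predicted]
    rec = sum(distance_to_intervals(t, pred_points) for t in gt_points) / len(gt_points)
    return prec, rec
```

With a single event spanning timestamps 10 to 14 and predictions at 12 and 20, the precision-side distance is 3.0 (0 for the hit, 6 for the false alarm) and the recall-side distance is 1.2, so a distant false alarm is penalized in proportion to how far off it is, rather than counted as a binary miss.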
In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which are a good approximation of the continuous space of spherical rotations, and it can be implemented using standard 2D convolutional layers, giving it a lower computational cost than most spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function; by interpreting the output of the convolutional layers as a probability distribution, it allows us to solve DOA estimation as a regression problem. We show that models that fit the equivariances of the problem outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors below 10° even in scenarios with a reverberation time $T_{60}$ of 1.5 s.
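A generic soft-argmax over directions on the sphere can be sketched as: apply a softmax to the per-direction scores to obtain a probability map, take the expected direction vector, and renormalize it to unit length. The grid of directions, the sharpness parameter `beta`, and the function name are assumptions; the paper proposes its own variant on the icosahedral grid, which this sketch does not reproduce exactly.

```python
import numpy as np


def soft_argmax_direction(scores, directions, beta=1.0):
    """Differentiable DOA estimate from a spherical score map.

    scores: (N,) network outputs, one per grid direction.
    directions: (N, 3) unit vectors of the grid points (assumed grid).
    beta: hypothetical sharpness; large beta approaches a hard argmax.
    """
    s = beta * np.asarray(scores, dtype=float)
    s = s - s.max()                      # numerical stability
    p = np.exp(s)
    p = p / p.sum()                      # scores -> probability distribution
    v = p @ np.asarray(directions, dtype=float)  # expected direction vector
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("probability mass is symmetric; direction is undefined")
    return v / norm                      # project back onto the unit sphere
```

Because every step is smooth, the estimated direction can be trained directly with a regression loss such as the angular error, which is the property the abstract relies on.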
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from a few sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and to point permutation, so it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and substantially improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
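An Offset-Attention layer (introduced in the Point Cloud Transformer) feeds the difference between the input features and their self-attention output through a transformation and adds it back as a residual. The sketch below is a simplified version with plain row-wise softmax attention and a ReLU on a linear map instead of the original normalization and Linear-BatchNorm-ReLU block; the weight shapes and names are assumptions. It demonstrates only the point-permutation property, not pose invariance, and does not reproduce the 3DSGrasp encoder-decoder.

```python
import numpy as np


def offset_attention(features, w_q, w_k, w_v, w_out):
    """Simplified offset attention over a set of point features.

    features: (N, d) per-point features; all weight matrices are (d, d).
    Returns (N, d). Permuting the input rows permutes the output rows the same way.
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)        # row-wise softmax
    self_attended = attn @ v
    offset = features - self_attended                    # the "offset" term
    return np.maximum(offset @ w_out, 0.0) + features    # ReLU(linear) + residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 4))
    weights = [rng.normal(size=(4, 4)) for _ in range(4)]
    perm = rng.permutation(8)
    same = np.allclose(offset_attention(x[perm], *weights), offset_attention(x, *weights)[perm])
    print("permutation equivariant:", same)
```

Following the layer with a symmetric pooling such as a per-channel max yields a global feature that does not depend on point order, which is how permutation invariance of the whole network is usually obtained.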
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient because it uses parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, which translates into high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M-parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The 3B-parameter Muse model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
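Parallel decoding with masked tokens can be sketched in the style of MaskGIT: start with every image token masked, predict all masked tokens at once, keep the most confident predictions, and leave the rest masked according to a cosine schedule until none remain. The schedule, the `predict_fn` interface, and the dummy predictor are assumptions for illustration, not Muse's actual implementation (which, among other things, conditions on the LLM text embedding).

```python
import math

import numpy as np

MASK = -1  # hypothetical id marking a masked token


def cosine_mask_ratio(step, total_steps):
    """Fraction of tokens still masked after `step` of `total_steps` iterations."""
    return math.cos(math.pi / 2 * step / total_steps)


def parallel_decode(predict_fn, num_tokens, total_steps, rng):
    """Iteratively fills all masked tokens in `total_steps` parallel passes.

    predict_fn(tokens) -> (num_tokens, vocab) array of per-token probabilities.
    """
    tokens = np.full(num_tokens, MASK)
    for step in range(1, total_steps + 1):
        masked = np.where(tokens == MASK)[0]
        if masked.size == 0:
            break
        probs = predict_fn(tokens)
        # Sample a candidate for every masked position in one pass
        sampled = np.array([rng.choice(probs.shape[1], p=probs[i]) for i in masked])
        confidence = probs[masked, sampled]
        # Keep the most confident candidates; re-mask the rest per the schedule
        num_remaining = int(math.floor(cosine_mask_ratio(step, total_steps) * num_tokens))
        num_to_fix = max(1, masked.size - num_remaining)
        keep = np.argsort(-confidence)[:num_to_fix]
        tokens[masked[keep]] = sampled[keep]
    return tokens


if __name__ == "__main__":
    def dummy_predict(tokens):
        # Stand-in for the Transformer: uniform over a 4-token vocabulary
        return np.full((tokens.size, 4), 0.25)

    print(parallel_decode(dummy_predict, num_tokens=16, total_steps=8,
                          rng=np.random.default_rng(0)))
```

Here 16 tokens are produced in 8 network calls instead of the 16 an autoregressive decoder would need, which illustrates where the efficiency gain over models such as Parti comes from.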
Ithaca is a Fuzzy Logic (FL) plugin for developing artificial intelligence systems within the Unity game engine. Its goal is to provide an intuitive and natural way to build advanced artificial intelligence systems, making their implementation faster and more affordable. The software consists of a C# framework and an Application Programming Interface (API) for writing inference systems, as well as a set of tools for graphical development and debugging. Additionally, a Fuzzy Control Language (FCL) parser is provided to import systems previously defined using this standard.
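The kind of inference system such a plugin lets developers describe can be sketched as a small Mamdani controller: fuzzify the input with triangular membership functions, fire each rule with min implication, aggregate with max, and defuzzify by the centroid. This is a generic Python illustration with made-up variables (temperature to fan speed); it does not use Ithaca's C# API or FCL syntax.

```python
def triangle(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms: input in degrees Celsius, output in percent
TEMP_SETS = {"cold": (0.0, 0.0, 20.0), "warm": (10.0, 20.0, 30.0), "hot": (20.0, 40.0, 40.0)}
FAN_SETS = {"slow": (0.0, 0.0, 50.0), "medium": (25.0, 50.0, 75.0), "fast": (50.0, 100.0, 100.0)}
RULES = [("cold", "slow"), ("warm", "medium"), ("hot", "fast")]  # IF temp IS t THEN fan IS f


def infer_fan_speed(temperature, steps=1001):
    """Mamdani inference with min implication, max aggregation, centroid defuzzification."""
    # Fuzzification: degree to which the input belongs to each input term
    firing = {name: triangle(temperature, *params) for name, params in TEMP_SETS.items()}
    num = den = 0.0
    for i in range(steps):
        y = 100.0 * i / (steps - 1)  # sample the output universe [0, 100]
        # Clip each rule's output set at its firing strength, then take the maximum
        mu = max(min(firing[t], triangle(y, *FAN_SETS[f])) for t, f in RULES)
        num += mu * y
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for this input")
    return num / den  # centroid of the aggregated output set
```

At 20 °C only the "warm" rule fires, so the output is the centroid of the "medium" set, 50%; colder inputs shift the result toward "slow" and hotter ones toward "fast".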